-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Complex Dtype Support for Hashmap Algos #36482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: Luca Ionescu <[email protected]>
pandas/tests/test_complex.py
Outdated
@@ -0,0 +1,129 @@ | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any better locations for this test file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you split the tests to the appropriate files: pandas/tests/series/methods/test_value_counts.py
for example
do we have a list of what algorithms this is used for? if they are all unique/factorize-like, we might be able to do a |
This seems OK to me to start - could probably optimize later with things like |
I'm finding in a mostly-unrelated branch that having actual support for complex (in particular complex128) would be tremendously helpful. Just gentle encouragement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls rebase as well
pandas/tests/test_complex.py
Outdated
@@ -0,0 +1,129 @@ | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you split the tests to the appropriate files: pandas/tests/series/methods/test_value_counts.py
for example
Sure will take a look @jreback |
Sure which branch was this? This doesn’t offer complex128 support - but makes a lot of basic functional work. But I can perhaps look into that in a follow up @jbrockmendel |
fixing this up today. |
Good idea added some testing around this in Added testing for complex64 and complex128 in I think as a follow up we can address the inferred_type issue in complex indexes:
|
cc @realead if you can look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alimcmaster1 Please delete outdated comment, otherwise lgtm.
], | ||
) | ||
def test_value_counts_complex_numbers(self, input_array, expected): | ||
# Complex Index dtype is cast to object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment still valid? IIUC value_counts
uses complex128/256 and not objects, see
cpdef value_count(ndarray[htfunc_t] values, bint dropna): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dtype of the index will be objects, see below. Agree this probably needs fixing - I can create a follow up. As this is the same issue as your comment below refers too. #36482 (comment)
In [14]: pd.Series([1 + 1j, 1 + 1j, 1, 3j, 3j, 3j]).value_counts().index
Out[14]: Index([3j, (1+1j), (1+0j)], dtype='object')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls create a followon issue
[ | ||
( | ||
[1 + 1j, 0, 1, 1j, 1 + 2j, 1 + 2j], | ||
np.array([(1 + 1j), 0j, (1 + 0j), 1j, (1 + 2j)], dtype=object), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, dtype=object
here... I ask myself, whether we should have a unique
-function with fused types, like we already do for other functions e.g. value_counts
(
cpdef value_count(ndarray[htfunc_t] values, bint dropna): |
What do you think @jbrockmendel? Probably should not be part of this PR though. It looks like
dtype=object
for factorize
and Index
are just consequences of unique
returning objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel - whenever you have time, think your eyes on this would be much appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yah i think ideally this should return a complex dtype (same for factorize above). OK for that to be a separate PR, can leave a comment on the test to that effect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great have added comments to that effect - will create a follow up issue
restarting CI test failures unrelated.
All comments addressed this should be good. |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Appreciated the review :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks a lot @alimcmaster1
if you can create the followon (checkboxes if needed) would be great
Ref: #18009
Based on #27599
First set of tests for complex number handling + sensible results from functions that rely on hash tables.
Use generic object hashing for now.
@jbrockmendel you interested in reviewing?